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By exploiting an analogy between population genetics and statistical mechanics, we 
study the evolution of a polygenic trait under stabilizing selection, mutation, and 
genetic drift. This requires us to track only four macroscopic variables, instead of 
the distribution of all the allele frequencies that influence the trait. These macro- 
scopic variables are the expectations of: the trait mean and its square, the genetic 
variance, and of a measure of heterozygosity, and are derived from a generating 
function that is in turn derived by maximizing an entropy measure. These four 
macroscopics are enough to accurately describe the dynamics of the trait mean and 
of its genetic variance (and in principle of any other quantity) . Unlike previous ap- 
proaches that were based on an infinite series of moments or cumulants, which had 
to be truncated arbitrarily, our calculations provide a well-defined approximation 
procedure. We apply the framework to abrupt and gradual changes in the optimum, 
as well as to changes in the strength of stabilizing selection. Our approximations 
are surprisingly accurate, even for systems with as few as 5 loci. We find that when 
the effects of drift are included, the expected genetic variance is hardly altered by 
directional selection, even though it fluctuates in any particular instance. We also 
find hysteresis, showing that even after averaging over the microscopic variables, 
the macroscopic trajectories retain a memory of the underlying genetic states. 
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1. INTRODUCTION 

Quantitative genetics aims to explain and describe the response of a heritable trait 
to selection, but in the classical approach, dispensing with the genetic details that 
account for that heritability (Lande 1979; Turelli 1988; Lynch and Walsh 1998). 
Yet, population genetics tells us that such responses cannot be predicted without 
knowing those genetic details (Turelli 1988; Barton and Turelli 1987; Burger 2000). 
Paradoxically, we can indeed make such predictions from empirical measurements 
of traits, but only by assuming that the genetic variance (the variance of the trait 
caused by genetic differences) is fixed: changes in variance due to selection have 
been hard to predict (e.g. Hill 1982; Zhang and Hill 2010). 

Directional selection (DS) will usually allow a quick response at the cost of the 
depletion of genetic variance, which will ultimately be due to mutation around a 
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unique optimal genotype ( Burger 2000, ch IV). However, most traits are thought to 
be under some kind of stabilizing selection (SS), in which case the genetic variance 
will be reduced by fixing any one of the many genotypes that match the opti- 
mal value (Wright 1935a; Bulmerl972; Barton 1986). In order to understand the 
full stochastic model, that is a polygenic trait under SS, mutation and drift -the 
goal of this article- we first summarize the theory for infinite populations (Wright 
1935a 1935b; Barton 1986), where random fluctuations are absent, and evolution is 
deterministic. 

For simplicity, we assume that each gene has only two alternative states (termed 
alleles). When mutation (/i) is much weaker than the selection (s) on each allele, 
the trait mean converges to the optimum, because any genetic locus can be almost 
fixed for either allele (i.e., allele frequencies are close to or 1). Thus, there will be 
many stable equilibria for the whole set of loci, at which any genotype that matches 
the optimum is near fixation - there are degenerate genetic states for a trait mean 
matching the optimum. Assuming that there are enough genes (n »1), the genetic 
variance is then roughly Anfi/s (Barton and Turelli 1987). When this variance is 
much greater than the maximum that can be contributed by any one locus (Barton 
1986), its increase due to a substitution at any one locus is much smaller than the 
existing variance, as we expect to be the case for a trait that is influenced by many 
genes. In this regime, there are many sub-optimal stable equilibria, for which the 
genotype that is close to fixation does not quite match the optimum. For example, 
if the optimum is at zero, then with 100 loci, there are Tggjyj ~ 10 29 equilibria that 
match this precisely, but asymmetrical equilibria with 49 '-' and 51 ' + ' loci may 
also be stable. However, selection adjusts the frequencies of rare alleles with these 
asymmetrical combinations, such that the mean is still very close to the optimum, 
but the genetic variance is substantially higher, and the mean fitness is lower as a 
consequence (Barton 1986). 

Imagine that the population is initially at an 'optimal' equilibrium, which 
maximizes mean fitness. If the optimum now shifts, the population will stay near 
fixation for the original genotype, and so the genetic variance will inflate as allele 
frequencies adjust to keep the mean near the optimum. At some point, the equilib- 
rium becomes unstable, one or more loci substitute, and the genetic variance drops 
abruptly. If all alleles have the same effect, as assumed so far, then variance will 
inflate at many loci, and so may become very high (fig.[T]). However, with unequal 
effects, the loci with smallest effect will substitute first, and the overall fluctuation 
will be much smaller. 

If the trait mean lags far behind the current optimum, we would expect to see 
erratic trajectories in an infinite population because of the following two points: 

(i) very rare alleles will increase, and will sweep through at an unpredictable 
time that depends on their initial frequency. 

(ii) The various -and phenotypically equivalent- combinations of alleles are reor- 
ganized when the phenotypic optimum takes a new value, transiently inflating 
the variance, as explained above. 

We envisage three factors that smooth potentially erratic phenotypic paths. 
First, if the effects of alleles vary across loci, there is the opportunity to make 
smaller steps as the optimum changes, which alleviates pronounced and coincident 
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Figure 1. Top row: (A) Allele frequencies, (B) trait mean, and (C) genetic variance plotted 
against time. A population is initially at equilibrium with stabilizing selection s — 0.05 
towards z op t = acting on an additive trait, with n = 100 loci of effect 7=1; the mutation 
rate is /i=0.002 per locus, which maintains a genetic variance of v — 4nfi/s = 16. The 
optimum then shifts abruptly to z opt = 20, and the mean responds almost immediately 
(top, middle). The variance increases abruptly (top right) as ,the allele frequencies at 
all the '-' loci increase substantially (A, lower left region). However, this new state is 
unstable, and slight variations in the initial conditions cause some loci to shift down, and 
some to shift up. As a result, the genetic variance returns to its original value. The lower 
row shows snapshots of allele frequencies at times (D) 0, (E) 800 and (F) 3000 generations. 

changes in allele frequencies, and diminishes the excess variance as the phenotypic 
optimum moves. Second, if we average over all the possible genetic combinations 
that match the phenotypic optimum, then we expect that the speed and direction of 
the response to a change in optimum will be more regular. Third, if the population 
is finite, random drift will on the one hand homogenize the initial distribution of 
allele frequencies, alleviating point i above, and on the other, facilitate transitions 
between 'adaptive peaks', alleviating point ii, so that the population can evolve 
without being trapped at local equilibria, as in Wright's (1931b) 'shifting balance' . 
Also, note that under drift, plus weak mutation, allele frequencies p are distributed 
as <~ [p(l — p)]" 1 , a special case in which the genetic variance stays constant under 
DS (since with this distribution, the increased variance due to rare alleles that be- 
come common is baalnced by the loss of variance from common alleles that become 
rare). Here, rare alleles replenish variance at the same rate as common alleles are 
fixed. 

Measurable quantitative genetic variables are not sufficient to predict response 
to selection: the dynamics of the trait mean and of the genetic variance under selec- 
tion depend on allele frequencies. We can use moments instead, but their dynamics 
depend on higher moments, and we end up requiring as many variables as allele 
frequencies (Barton and Turelli 1987; Burger 1991 1993 2000; Turclli and Barton 
1994). In part, this problem arises from i and ii above. Surprisingly, including a 
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third evolutionary process, random genetic drift, allows us to employ a different 
mathematical toolbox (from physics' statistical mechanics, SM) to make reliable 
predictions. Although we can no longer predict a single trajectory in detail, we can 
instead predict expectations over the ensemble of trajectories. Thus, we will focus 
on the second and third factors mentioned above, namely the averaging over allele 
frequencies that is caused by random genetic drift. 



(a) Statistical Mechanics 

The need to specify the detailed genetic state is avoided by the development of 
an analogy between SM and population genetics (Priigel-Bennet and Shapiro 1994; 
1997; Rattray 1995; Priigel-Bennet 1997; Rogers and Priigel-Bennett 2000). In these 
works, information entropy (Shannon, 1948) was maximized under the assumption 
that any initial state was a priori equally likely. However, this choice of prior is 
not based on population genetics: we know that states are not cquiprobable under 
random genetic drift, because in a finite population, fluctuations bias the neutral 
distribution towards the boundaries. Further work developed this idea in detail, tai- 
loring an entropy measure that ensures that the equilibrium distribution matches 
the stationary solution to the diffusion equation (Barton and de Vladar 2009, Bar- 
ton and Coe 2009). This allows us to collapse an arbitrary number of degrees of 
freedom that cannot be accurately known (e.g. allele or genotype frequencies, cen- 
tral moments, cumulants, etc.) into a few variables, which approximate the whole 
stochastic system. The theory is built in two stages. 

Maximum entropy. First, we observe that a measure of entropy is maximized at 
equilibrium, subject to constraints on the expectations of the macroscopic variables 
(Barton 1989; Aita et al 2004 2005; Sella and Hirsch 2005; Barton and de Vladar 
2009). (Alternatively, we can define a 'free fitness' as the entropy divided by popu- 
lation size, plus the mean fitness, which is maximized at equilibrium under selection 
and drift; this is analogous to free energy in thermodynamics (Iwasa 1988, Barton 
and Coe 2009)). Although entropy is a concept alien to population genetics, it sum- 
marizes two fundamental aspects of the equilibrium distribution. On the one hand, 
it measures the degree of divergence of the distribution of the allele frequencies 
from a neutral base distribution. On the other hand, the entropy couples two sets 
of variables, the microscopic that in this case we take to be the allele frequencies, 
and the macroscopic that are the expectations of quantitative variables. 

Generally speaking, the SM approach takes advantage of genetic drift in two 
ways. To start with, drift is expected to spread the trajectories relaxing the prob- 
lem of trapping at local fitness optima, just as envisaged by Wright (1932). To 
follow, instead of calculating the values of metric characters themselves - these are 
stochastic variables- it makes predictions for their expectations. 

Although we are free to apply any kind of selection, it is crucial that it act only on 
the chosen macroscopic variables. That is, we cannot assume that selection will act 
on specific genotypes, because then, selection could favour arbitrary and improbable 
macroscopic states: the maximum entropy Ansatz would be contradicted, and would 
inevitably fail to give accurate predictions. For example, if we force the population 
to a specific set of allele frequencies, pi, then under directional selection s there 
would be sudden jumps at times <~ (1/s) log (1 jpt ), as alleles reach high frequency. 
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The distribution of allele frequencies -like any statistical distribution- is for- 
mally a function of a set of parameters. In population genetics, it typically depends 
on the selection coefficients, mutation rate, and population size, and the distribution 
of macroscopics can be calculated from these, using the diffusion approximation. 
This averaging over the microscopic variables has the fortunate consequence that 
the distribution of macroscopics, as well as their rates of evolution, do not explic- 
itly depend on the allele frequencies, although their distribution is in fact estimated 
implicitly by maximizing the entropy. This averaging is not straightforward under 
stabilizing selection, because there is epistasis for fitness, which fully couples the 
distribution of allele frequencies. However, the calculations can be greatly simplified 
because under the central limit theorem, the distribution of the macroscopic vari- 
ables, conditioned on a particular value of the trait mean, will be approximately 
Gaussian. This will allow us to eliminate any explicit dependence on the allele 
frequencies, and gives workable expressions for the expectations. 
Quasi-equilibrium. Second, we make a quasi-equilibrium approximation, which as- 
sumes that even during periods of change, the distribution of allele frequencies 
adopts a shape that takes the form of an equilibrium distribution, whose parame- 
ters are chosen so that the distribution matches the current macroscopic variables. 

In this framework, we can describe the state of the population by two comple- 
mentary sets of variables: the expectations of the chosen macroscopics ((z) , (v) . . .), 
and the corresponding selective forces on these macroscopics. This is precisely anal- 
ogous the complementary set of variables (forces and variables of state) in statistical 
mechanics. Rather than working with the rates of evolution of the macroscopic vari- 
ables, we can calculate the rates of change of the evolutionary forces. Consequently, 
we can find the expectations of the quantitative variables at any time, and thus 
know their evolutionary trajectory. If parameters change sufficiently slowly, the en- 
semble will be close to an equilibrium distribution, as with reversible changes to a 
physical system. When parameters change quickly, however, nothing ensures that 
the distribution will be close to some stationary distribution. Thus, the present 
work aims in part to verify this assumption, by comparing the results with numer- 
ical simulations. The mathematical details of this method are explained in detail 
by Barton and de Vladar (2009). 

A crucial advantage of the SM method is that it avoids the recursion to higher 
moments (or cumulants) of the traits, where higher order terms must be neglected 
arbitrarily (Barton and Turelli 1987 1990; Burger 1991 1993 2000; Turelli and Bar- 
ton 1994; Rattray and Shapiro 2001): the key is the choice of an appropriate set of 
macroscopics, which is determined by the set of traits hat determine fitness. 

Previously, we have successfully applied these ideas to the much simpler case of 
DS on a polygenic trait subject to mutation and drift (Barton and de Vladar 2009). 
There, our predictions are accurate for many loci of arbitrary effects. However, 
under DS solving the problem for a trait with n loci amounts essentially to solving 
the problem for a single locus, since there is no coupling between loci. Thus it is to 
some extent, a trivial case. - though not entirely so, because maximum entropy still 
drastically collapses the degrees of freedom from the full allele frequency distribution 
to only two, the trait mean, and the log-heterozygosity (eq. 2.3 ). A more demanding 
situation, and of deep biologically relevance, is SS, which is the main focus of our 
present work. We will study whether the SM method applies to the challenging 
case of SS, mutation, and drift. In particular, we will develop the details using 
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Gaussian SS as a model. We analyze sharp changes in the optimum, a steadily 
moving optimum, and changes in selection on the genetic variance. 

2. STABILIZING SELECTION, MUTATION AND 
GENETIC DRIFT 

(a) Distribution of allele frequencies 

We assume throughout that recombination is fast relative to selection, mutation 
and drift, so that we can assume linkage equilibrium, and describe the population 
by its allele frequencies. We also assume that populations size is not small, so that 
we can use the diffusion approximation. The fitness of an individual with trait 
z is given by log [W z ] — — § (z — z ) 2 , which penalizes deviations from the opti- 
mum z Q , since the extremes are unfit. Notice that this expression can be written as 
— | (z — z) 2 — | (z — z a ) 2 +2| (z — z) (z — z opt ). The first term suggests that indi- 
viduals that deviate from the mean of the population have less fitness. The second 
term is the deviation of the trait mean from the optimum. The mean log fitness 
is thus log[VK] ~ log [W] ~ — |i/ — § (z — z opt ) 2 . Here we neglect terms O (s 2 ), 
and the third term above averages to zero. For further convenience, we expand the 
quadratic term, and express it as log \W] = ai , +f3z — \z 2 + const. This is essentially 
the same, but where selection is acting 'independently' over three quantitative vari- 
ables: selection (of strength a) against the genetic variance, v\ a directional term 
of strength (3 (= — z opt s) regressing the mean trait z to the optimum, and selection 
against deviations from the actual mean (of strength A). Most works consider that 
a = A = —s/2, but here this constraint will be relaxed and allow a and A to dif- 
fer, in order to improve the quasi equilibrium approximations, as will become clear 
later. We can think of examples, such as sexual selection or resource exploitation, 
where the individuals that deviate too much from the population's current mean 
value are unlikely to leave offspring, or selection might act directly on heterozy- 
gotes (and hence directly on v). Of course, in some of these cases, selection might 
be frequency-dependent, in which case we will not have a stationary distribution 
that is of potential form, and the following analyses might not hold. However, we 
will keep these issues aside and concentrate on the simpler case of frequency inde- 
pendent selection, but allowing selection on the variance and on the deviation of 
the trait mean (factors a and A, respectively) to differ. 

We will assume that selection is weak relative to recombination, so that linkage 
equilibrium holds. Then, if the population is moderately large, the evolution of the 
frequency pi of a beneficial allele at each locus can be described by 

dp pg 5 log [W] . . , on 

Tt=Y^p— -M2p-i) + & (2-1) 

where /i is the mutation rate (assumed to be reversible and equal in both directions), 
and £d is the stochastic term due to drift, modelled as a Gaussian of mean zero 
and variance pq/2N where q = 1 — p, and N is the effective size of the population. 
Notice that W depends on z 2 and so couples the dynamics at all loci, something 
that does not occur under DS. In other words, fitness is essentially epistatic. 
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At equilibrium, this process has the joint distribution of allele frequencies, 4>(p) 
(Wright 1931; Kimura 1965; Crow and Kimura 1970, pp. 442): 

, exp - 2NXz 2 + 2Nav + 2N/iU] , 

m mu^, ,2 - 2) 

which is identical to the more familiar expression ip(p) = cW 2N Yli =1 {Pi1i) iN ' l ~ 1 
but arranged in a different way. The quantity Z normalizes the distribution. (From 
now on, we will use bold characters to denote vectors, and script capitals to denote 
matrices). As explained before (Barton 1989; Barton and de Vladar 2009; Barton 
and Coe 2009), we describe the effects of mutation by a potential function U, which 
is essentially the log of the geometric mean heterozygosity: 

n 

C/ = 2]Tlog[ K% ] (2.3) 

i=l 

Thus we have a set of macroscopic variables A = {z,z 2 ,v,U} that is associated 
with the parameters (forces) or={/3, A, a, /i}. Both together completely determine 
the distribution of allele frequencies. At first sight it might seem that specifying 
both z and z 2 is redundant. However, the latter introduces epistasis to the fitness 
function, making it non-multiplicative. But also, since it is their expectations that 
constrain the entropy, both terms arc necessary to couple the microscopic and 
macroscopic variables, and thus to determine the expectations of the mean and the 
variance of the trait. 



(6) Additive -polygenic traits 

So far, we made no assumptions about the trait. The set of variables just defined, 
j4, depend arbitrarily on the allele frequencies. We will assume now some elemen- 
tary properties. Each trait is composed of a pair of alleles (thus we are assuming 
diploids) that contribute additively, each with an effect 7j. The labels X denote 
different alleles (X=0,1). Under this model, the trait, its mean, and its variance in 
the population are respectively 

n 

z = J2li( X i+ X 'i- 1 ) ( 2 -4) 

i=l 

n 

z = 5>i(2Pi-l) (2.5) 

i=l 

n 

v = 2Y jl \ Vi q i (2.6) 

Defined this way, the trait is in the range of ± ^ 7; , and the genetic variance 
is in [0, c max ], with ^ max = | Y^i=i if - ^ we assume that all loci have equal effects 
7, then z = 7X^=1 (^Pl — 1): an d v — 2 7 2 Y^i=iPl<ll- Without loss of generality, we 
will assume 7=1. We can always include 7 7^ 1 by scaling j3 — > 7/3 and a — > -f 2 cr. 

Along this article, we assume that mutation (fi) is weaker than selection (s) 
on each allele, that is \x < sj 2 /4. But at the same time, since we require that the 
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standing genetic variance is grater than the variance due to a substitution, we will 
work in the regime of s7 2 /4 < nfi (Barton 1986), which ensures that there are 
many stable genetic equilibria. 



3. STATISTICAL MECHANICS 



In our previous work, we explained in detail how the distribution in eq. ( 2.2 1 can be 
derived by maximizing a particular entropy measure (Barton and de Vladar 2009). 
In population genetics, this uncertainty measures the possible states in which a 
population can be. It is a functional of the distribution of allele frequencies ip( p). 
We need to employ an information entropy measure which is defined relative to the 
distribution under drift alone (Barton and de Vladar 2009; Barton and Coe 2009). 
That is, in the absence of any other evolutionary factor, random drift would drive 
the evolution of the population towards the distribution^ = Yi?=i (PlQl) > which 
is a U-shaped distribution (Wright 1931). In fact, <f> is not properly a distribution 
because it cannot be normalized. Nevertheless, under drift alone <fi is the solution 
to the diffusion equation (Kimura 1962), and we will set entropy so that with a 
base distribution <f>, it gives the correct stationary distribution with mutation and 
selection. Accordingly, we defined the entropy as 

S(fl>) = J ^lagW^P (3-1) 

When S is maximized with respect to ip we get that ip oc <j). Although (f> will 
diverge when pq^O, imposing constraints on S will ensure that this is not a problem. 
This particular entropy measure was postulated to ensure that the distribution 
of allele frequencies corresponds to the one obtained by the diffusion methods, 



i.e. eq. (2.2 1. However, the probability distribution is changed in the presence of, 
say, mutation and/or selection. Thus, entropy changes under any other factor that 
affects the evolution of a character or of fitness, like selection or mutation, because 
these increase or decrease the uncertainty of the outcomes, biasing the distribution 
To include these effects, we require that the distribution of allele frequencies 
must have specific values for the expectations ( A) . This is introduced in the form of 
constraints to the entropy maximization problem, and thus we introduce a Lagrange 
multiplier for each constraint (A^) — J A^ij)d n p. Thus, we need to maximize the 
following functional (on if)): 

J } J 

Here the third term ensures normalization of the distribution. This leads exactly 



to the distribution of allele frequencies in eq. (2.2), and thus the Lagrange multi- 
pliers are the forces (selective coefficients, and mutation rate). The normalization 
condition is: 



Z = / exp 



md n p, (3.2) 



where we renamed the multiplier exp [— 2Nuq]= Z. This quantity is a function of 
the parameters a.j , ( j ^ 0) and it is analogous to the partition function in statisti- 
cal physics. It plays a fundamental role, because it allows direct calculation of the 
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macroscopics, as well as their cc-variances, since Log[Z] is a generating function: 
d Q log[Z] = 2N(A), and d^ a \og[Z] = 4N 2 C, where C = cov(A,^) is the co- 
variance matrix of the macroscopics. These expectations and covariances are with 
respect to the distribution i/j, that is across possible states of a population, or al- 
ternatively with respect to an ensemble of populations, and must not be confused 
with the means and variances of a trait within a population. The technical effort 
focuses on a workable calculation for Z, which will lead to expressions for the rates 
of change of the a and the (A) . In general, any vector of quantitative variables, 
in our case A, that is a function of the allele frequencies will have an expectation 
(macroscopic) that evolves as 

±(A)=B.a + V (3.3) 

where B is a generalized matrix of genetic covariances, and V the rate of random 
drift; B v] = (ELi ff Tlfc ) and V * = (HU The se formulas can be 

derived straightforwardly in several ways. The most compelling for a biologist is to 
use Wright's (1937) formula, via the diffusion equation (Ewens 1979, pp. 136-137), 



or from the Fokker-Planck equation (for eq. 2.1 Gardiner 2004, ch. 5). Neither V, 
B, nor C (defined above) assume additive or equal effects. However to obtain specific 
expressions we need to make some assumptions. We will now assume additivity, for 
which we get 



V 


2zv 


m 3 


-2z 


2zv 


\z 2 v 


2Z7T13 


-4z 2 


m 3 


2zrri3 


7714 




-2z 


-Az 2 


4 (tWx - v) 


H 



and 



"-(^■-ffi.-^') 



These expressions are valid also under unequal effects. (The unfamiliar macro- 
scopics involved, namely 7773,7774, and H are specified in Appendix 1). Note that 
the SM method provides a way of directly calculating the expectations of these 
quantities, as well as of the higher moments of the trait: they follow from mathe- 
matical manipulations of the generating function Z (Barton and de Vladar 2009) . 
This is one of the most important improvements from the SM method with respect 
to other moments or cumulant methods where these higher order moments have 
to be computed ab initio using their particular differential equations, and approxi- 
mated using specific models, like the Gaussian or House of Cards models (Barton 
and Turelli 1989), perturbation analysis (Rattray and Shapiro 2001), and/or simply 
by truncating the equations at a convenient level (Barton and Turelli 1987; Biirger 
1991 1993 2000; Turelli and Barton 1994; Rogers and Priigel-Bennett 2000). 
Under the quasi-equilibrium approximation, we assume that at every time point 
there is a set of variables a%s (for short, a*) which would give the observed ( A) 

at a steady state, so that — 0; then V"* = — B* ■ a* . Thus, the matrix B in eq. 



(3.3) is approximated by its eq uilib rium expression, but evaluated at a*, denoted 
B* . Then, substituting into eq. (3.3), we get that = B* ■ ( a — al^ ). Since the 
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matrix B* depends only on the a* , and the latter are calculated at every time point 
following the assumption of maximum entropy (see Barton and de Vladar 2009 for 
more details on the implementations) , we are able to make predictions for the long 
term response to selection of the expected trait mean and of the expected genetic 
variance. 



(a) Approximating the generating function. 



The function Z (eq. 3.2 ) is a multidimensional integral over the frequencies at 
the n loci. These integrals are coupled, because the function z 2 is not separable 
into additive terms. Thus, in this section we will focus on an approximation for Z. 
To begin with, we recall that Wright's formula (1937; eq. 39) implies that we can 
always express ip(p) as W 2N x i/j s {, where ip s f = Yl" =1 exp [—ANopiqi] (pift) 4 ^ 1 is 
the distribution of the system free of selection over the trait mean. Thus we include 
the effects of drift, mutation (/i), and also of selection on the genetic variance (a); 
these are all additive terms. Because i/'sf (when normalized) is a product of per- 
locus distributions, the central limit theorem indicates that with many loci, "0sf 
will converge to a Gaussian (Appendix 1). Furthermore, the contribution of each 
locus is typically bimodal, where each peak is close to the borders. Hence, we have 
2 n adaptive peaks, which correspond to all the combinations of ' + '/'-' alleles. In 
order to gain accuracy, we expand as a Gaussian not the whole distribution ?/> s f, but 
rather the distribution around each of the adaptive peaks, by dividing the range 
of each allele frequency into two regions. Under equal effects, there can be m and 
n-m loci near and 1, respectively, weighted by a binomial term (Appendix 1). 
Each peak will be characterized by its expected mean trait values and variances. 
Because of the symmetry under equal effects, at each peak the contribution to 
the expected mean of each allele is equivalent in magnitude, M Ql and of opposite 
sign for the ' + ' and '-' peaks (the trait mean is an odd function of the allele 
frequencies). Thus, for a given combination (m,n-m) the expected trait mean results 
in (z) m = (n—2m)M . Similarly, the variance contributed by each peak is, per locus, 
V , so that for a combination (m,n-m) results in var„ (z) = nV (the variance being 
an even function of the allele frequencies). In the same way, we may include the 
expectation of other quantities (y, U, etc., altogether denoted by a) along with their 
variances and covariances. Defining c rn = {cov m (ak)} k , the covariance matrix of 
a at a given peak with configuration (m,n-m), we approximate ip s { as a multivariate 
Gaussian, namely: 



1 q 

-- (a - (a) m ) ■ c m • (a - (a) m ) 



. (3 ' 6) 

where k is the number of variables included in the Gaussian approximation; in 
practice, we employed k=b variables, namely those required to compute B, which 
are a={z 1 u, 777,3, m 4j H}. Z D is the normalization constant of this Gaussian ap- 
proximation, which for equal effects is Z =^Z ^j , which in turn is the generating 
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function for n independent single loci at an adaptive peak (Appendix 2): 

,1/2 

Z„= / exp[4Acr M ](pg) 4Ar ^- 1 dp. (3.7) 
Jo 

After making this approximation, we need to multiply the resulting Gaussian 
distribution by the effects of selection, as specified by the mean fitness W 2N = 
exp \2N\z 2 + 2N/3z\ , and integrate on z, in order to get Z. Appendix 1 indicates 
the details and the intermediate steps in the derivation (which we work out for 
unequal effects, when the complexity of the calculation allows it). In short we find 
that the generating function is 



£ = , exp 
VI + AN\nV 



2N/3 2 



1 + AN\nV 



to 



m=0 



x 2^ I ~- I ex P 



4-777 77 

(N/3 - 2NX(n - 2m)M ) 



1 + 4N\nV 

(3.8) 

The expectations of the macroscopics follow from the derivatives of Log[Z] w.r.t. 
the parameters a as explained above, all of which can be written explicitly (see 

/ n 

Appendix 1). In practice because n is large, the binomial terms I 

V m 

puted by approximating them by a Gaussian, and the binomial sum by an integral, 
thus: 

Y,y m jF m ~-j= J exp [-2 (to - n/2) 2 /n] F (m) dm (3.9) 

which allows computations that can be handled numerically (Appendix 1). In the 
following sections, we study the accuracy of our approximations. 



4. RESULTS 

(a) Numerical experiments 

Exact analysis of the distribution of allele frequencies is in theory possible using 
a diffusion equation approach (Crow and Kimura 1970, ch. 7). But for many coupled 
loci, as in this case of SS, the time-dependent analytic solution is unknown, and 
its numerical calculation is not feasible -short of Monte Carlo methods that would 
amount to simulation of the original problem. Thus, the possibilities to check our 
method are reduced to either simulating allele frequencies, or to employ individual 
based simulations. We take the first approach, that is to draw multiple realizations 



of eq. (2.1), each giving a stochastic path of the allele frequencies, employing in 
our purposes an Euler scheme with time step At=0.01, and drawing deviates from 
a Gaussian with variance Atpq/2N for the stochastic term modeling genetic drift 
(e.g. Higham 2001). From these paths, we compute the trait mean and its genetic 
variance. Then, we can average over multiple realizations in order to estimate the 
expectations of these variables. We ignore linkage disequilibrium, but know that 
this is negligible for recombination rates r » s (Burger 2000, ch. VI. 7). Thus 
we work entirely from the diffusion approximation to the evolution of allele fre- 
quencies, assuming that selection is weak (s « r,l). Our statistical mechanical 
approach assumes that evolution starts from an initial equilibrium; it does not al- 
low arbitrary initial distributions that would give arbitrary outcomes. Therefore, 
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for a fair comparison, we need to start the numerical realizations also from an equi- 
librium, drawing an initial set of allele frequencies from the theoretical marginal 
distribution. Because the joint distribution of allele frequencies and the product of 
the marginals are not the same, we still allow the process to relax to the stationary 
state for a few thousands of generations (typically, ~ 10 5 ), until numerically we 
obtain that z ~ z opt - Only then we allow selection to change. From this moment, 
when evolution proceeds, we average and record the values to compare them later 
with the SM estimate. 

In order to perform this comparison, we perform a significance test. The null hy- 
pothesis "Ho is that the numerical calculations are not significantly different from 
the SM expectations. Thus to support our theory, we would like to accept Hq. 
Because the numerical expectations are averages over many realizations, these are 
normally distributed. From the SM theory we know the expectation and the vari- 
ance of the macroscopics that we want to test, namely, the trait mean and the 
genetic variance. Summing-up, we know the expected values and the variance and 
thus, ideally the residuals should be normally distributed. Hence, a z-test for the 
goodness of fit is enough for our purposes (Cox and Hinkley 1974, ch. 3). The 

z-statistic is z = J2 T =i \A T ~ C^*)) / var (^*)j where A are the averages over the 
simulations, (A*) is a short-hand notation for (A)^ a -^ (the expectation of A eval- 
uated at the local forces at time r), T is the total number of time points, and 
df=T-l are the degrees of freedom. We arbitrarily set the significance level to 99%. 
In addition, we also compare the maximum observed deviation, Max T (A T — {A*)j 
with the standard deviation from the SM at that same time point. Respectively, 
these two statistics quantify for the SM approximation the accuracy (that is, how 
close are the predictions to the true value) , and the precision (that is, how much 
the observations deviate from its expected value). In turn these two measures give 
us a grip to judge whether our approximation deviates significantly from the 'true' 
values (from the simulations), and also whether significant deviations are biologi- 
cally meaningful. As we will see in some examples, some systematic and statistically 
significant deviations may occur. 



(6) Shifting optimum 

The most radical test of our approximation is when selection changes abruptly. 
For example, a sudden shift of the optimum would trigger a quick response of the 
trait, and a major reconfiguration of the genetic states. The prediction of the change 
of the trait mean and of the genetic variance is thus not a trivial task. In turn, our 
approximation allows us to estimate the change of their expectations, which gives 
a rather robust prediction of their evolutionary course. Figures [2] and [3] present two 
examples of this situation. We compare this response with intensive simulations 



of the Wright-Fisher model (eq. 2.1|, for two distinct numbers of loci: five and a 
hundred. The response of the mean is quicker for traits with more loci. In this 
case of equal effects, the reason is clear: the expectations of the trait mean and 
the genetic variance are ' extensive ' , meaning that their values are proportional 
to the number of loci, n . The expectations are not strictly linear with n , but it 
is nevertheless a positive dependence. Thus, traits composed of a larger number 
of loci have larger genetic variance, and therefore respond faster to selection. This 
is expected from the formula from the Stochastic House of Cards model (SHoC; 
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Burger 2000, pp. 270, eq. 2.8) that predicts that the expected genetic variance will 
be 

2n 7 2 jV/i 
(l/>SHoC = 1 +7 W 

(in this case, we set the effect to 7=1). Thus increasing the number of loci, increases 
the genetic variance. The precision of our approximation does not seem to depend 
critically on n , except for a very low number of loci (n between 3 and 5, depending 
on the parameters, results not shown), where the Gaussian approximation fails. Yet, 
the predictions of the macroscopics are in very good agreement with the numerical 
expectations, even for n as low as 5. 

In these examples, we find that the expected genetic variance remains practi- 
cally unchanged, even though v fluctuates wildly in any one realization. In fact, in 
the two cases in figs. [2] and [3], (u) is variable, but achieves at most an increase of 1%. 
This makes it not only hard but to some extent pointless to aim for an accurate 
numerical or -more critically- empirical estimation of (v) (Fowler and Whitlock 
2002), in part because we would need an unrealistic number of observations. The 
variance of a set of independent (numerical or empirical) measurements of v is of 
comparable magnitude to the mean range of response of such measurements, so any 
change in v will be obfuscated by the effects of drift. 

However, the nearly constant patterns of the expectations of v should not be 
confused with a constancy of v itself. The latter will of course fluctuate around 
its expectation because of genetic drift. Accordingly, we evaluated the deviations 
from the expectations by computing Var(f) (which follows from the derivative 
d 2 log[Z]/ d4:Na 2 , an element of the matrix C), and from it, the root mean square 
deviation (RMSD). The RMSD shows the range in which the individual paths are 
most likely to occur. These bounds are also shown in the figures 3 and 4, exposing 
the fact that the effects of drift are big compared to the change in the expectations, 
but still in a narrow percentile range of the actual standing variation. This indicates 
that in practice v can be considered constant. Such small changes in the genetic 
variance and its expectation justify the classical approaches like the breeder's equa- 
tion, which relies on constant heritabilities for long term predictions (Lande 1979). 
For these kind of situations, a complicated mathematical machinery like the one in- 
troduced here seems unjustified. Nevertheless, we can still apply our calculations in 
order to compute other aspects that don't arise naturally from the classical theory. 
For example, we can easily estimate the expected variance of the set of measure- 
ments alluded above (Fowler and Whitlock 2002). This not only gives a rule of 
thumb of when to expect genetic variance to remain constant, but also provides a 
way to falsify the SM approach. With increasing numbers of loci (a) the range of 
change of v becomes smaller (there is larger mutational load), and (b) the range of 
fluctuation (i.e. the standard error) increases. Thus, we are even less able to detect 
changes in the expectations. 

These calculations give some insight into why (y) hardly varies in response to selec- 
tion. As figs. [2] and [3] suggest, drift would be the main driving force for the changes 
in genetic variance. Why isn't the change in (y) more pronounced? 
Thinking in terms of the distribution of allele frequencies (fig. [1] , showing the 
marginal distribution at one locus), we find that the distribution is bimodal. The 
positions of the adaptive peaks determine the genetic variance, which depends only 
weakly on their height. The relative height of these peaks, on the other hand, de- 
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termine the value of the trait mean, but depend weakly on their position. Thus, 
because these two quantities (z and v) are practically uncoupled, it is possible to 
select over them more or less independently. Selection on the trait mean will cause 
multiple jumps among the peaks (Barton 1986). But the position of the peaks re- 
main unchanged, because they are equidistant to the saddle, which is situated at 
an intermediate frequency p s =l/2 (this is because there is no dominance). Conse- 
quently, because the genetic variance is symmetric w.r.t the allele frequencies, few 
'-' alleles with frequency p (and thus ' + ' alleles with frequency. 1-p) result in 
as much variance as numerous '-' alleles with frequency 1-p (and ' + ' alleles of 
frequency p). So, although the alleles are jumping from one peak to the other, their 
frequencies change always from p to 1-p (or vice- versa) , affecting the proportions of 
' + '/'-' alleles at each peak, and thus the trait, but without affecting the genetic 
variance. However, the intermediate frequencies are transiently populated by the 
jumping alleles, so a slight increase in the genetic variance should be observed. This 
change, as a matter of fact is predicted by the SM but it is generally only a few 
percent of the standard error, and often too small to be detected. Notice however, 
that the force related to the genetic variance (that is, Na) shows a small change. 
But the scale is so narrow that it can practically be considered constant. This is of 
course supported by the SHoC expectation of the genetic variance, eq. (4.1 ), where 
the expectation of the genetic variance does not depend on (z) . 



(c) Effect of the number of loci on the rates of evolution. 

As suggested by comparing figs. [2] and [3j with 5 and 100 loci, respectively, the 
amount of standing genetic variation is proportional to the number of loci n . This 
insinuates that traits composed of more loci should adapt quicker to ecological 
changes. But, is there any relationship between these rates of adaptation and the 
number of loci? As a matter of fact, if we measure time relative to the expected 
number of mutations (that is 2n fit) the time to reach an equilibrium are comparable 
for different number of loci. In this time scale, we find that the genetic variance is 
slowed down, but it also reaches higher levels for traits composed of larger numbers 
of loci (fig. [l] in the electronic supplementary material, ESM 1). In the limit of 
infinite loci, we would have (1) constant genetic variance, but (2) its value would 
diverge, leading to 'instant' adaptation. A solution to this paradoxical limit, comes 
from rescaling the effects of the alleles with the number of loci (Bulmer 1980, Turelli 
and Barton 1990). Thus in making 7 ~ 1 /\/n, the genetic variance is regularized: 
as the number of loci increases, its rate of change go to zero, and it converges to a 
constant value, that is, the infinitesimal model (Bulmer 1980). These circumstances 
are depicted in fig. 2 of the ESM 1. 



(d) Moving optimum 

Another example that was alluded in the introduction and which is of general 
interest is when the optimum is steadily shifting its position (Burger 2005; Waxman 
2005; Kopp and Hermisson 2007; Sato and Waxman 2008; Kopp and Hermisson, 
2009) . Figure [5] presents a comparable situation to that of fig. [3j but where the 
optimum is moving slowly. Again, we do not detect major changes in (v) . Here, the 
trait is being selected in a range that is much less than its total range of response 
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Figure 2. Evolutionary dynamics when the optimum changes abruptly, shifted at time t=0 
from -0.75 to 0.75 (which corresponds to N/3 =-2.5 toiV/?=2.5, see panel C). The trait 
consists of n=5 loci of equal effect. Expectations (black dashed lines) of (A) a polygenic 
trait and (B) its genetic variance. The gray regions cover ± the standard error (root 
mean squared deviation of the variance of the macroscopics) . The change in the genetic 
variance is at most of 4.85% of the initial value, while its standard error is 20.5%. The gray 
solid lines are averages of the numerical realizations (with population size of N = 100; 
1000 replicas were employed). The goodness of fit (one tail chi-square with 51 degrees of 
freedom) accepts the null hypothesis (the simulation points are random samples of the SM 
distributions) in both cases: (A) z — — 0.058, p = 0.68; the maximum deviation is 12.19% 
of the standard error (0.47) ; (B) z — 0.12, p — 0.40; the maximum deviation is 18.27% of 
the standard error (0.31) . (C-F) Evolution of the local forces. Notice the short range of 
change on (D-F), these forces remain practically constant. NX— -1.0, Na=-15 and Nfi=0.3. 

(in, n=100 in this case, while the optimum shifts in a comparably short range, 
between ±5). However, if we consider a larger range, comparable to n (depicted 
in fig. [6]), then at the extremes we force fixation of some alleles, and the variance 
changes significantly. Notice that during evolution, the range of change of the ef- 
fective selective values (what we termed forces) can change notably (fig. [6]). But if 
we compare to other scenarios (e.g. fig. [5]), the forces can be virtually constant. 
Despite the previous examples, the polygenic dynamics when the optimum is steadily 
moving can be very complicated, because as the optimum shifts the allelic combi- 
nations that match the optimum continuously change with dramatic microscopic 
modifications (Kopp and Hermisson 2007 2009). The result is that the path of v is 
erratic in the deterministic case, because the alleles at different loci keep sweeping 
and their frequency changing, inflating and deflating the genetic variance in an al- 
most chaotic fashion. This is all smoothed out when drift is present and we average 
over an ensemble. Thus, a more serious test for our theory would be to consider 
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Figure 3. Evolutionary dynamics when the optimum changes abruptly, shifted at time t=0 
from -5 to 5 (which corresponds to N/3 =±10, that is about 5% of the total range; see panel 
C). The trait consists of n=100 loci of equal effect. Expectations (black dashed lines) of 
(A) a polygenic trait and (B) its genetic variance. The gray regions cover ± the standard 
error (root mean squared deviation of the variance of the macroscopics) . The change in 
the genetic variance is at most of 0.4% of the initial value, while its standard error is 
10.3%. The gray solid lines are averages of the numerical realizations (with N = 100; 500 
replicas were employed). The goodness of fit (151 degrees of freedom) accepts the null 
hypothesis in both cases: (A) z = — 1.14,p = 0.086; the maximum deviation is 26.5% of 
the standard error (0.50) ; (B) z = 0.18,p = 0.03; the maximum deviation is 22.19% of 
the standard error (1.41) . (C-F) Evolution of the local forces. Notice the short range of 
change on (D-F), these forces remain practically constant. iVA=-1.0, Na=-4 and Nfj,=0.5. 
Otherwise as in fig. [2] 



situations where there is considerable chance that the dynamics get stuck in local 
minima. 

(e) Selection on the genetic variance. 

In part, whether our predictions make biological sense, is a matter of time scale. 
In the first examples, genetic drift eliminated genetic variance at a similar rate to 
its production by mutation. Thus, the rate of change of (v) was very small. We 
can obtain a different picture if the genetic variance is forced to change along with, 
or instead of, selecting over the trait. Under stabilizing selection, the mean fitness 
is determined by two variance measures: the genetic variance, and the squared 
deviation of the trait mean w.r.t. the optimum. Both of them are under selection 
with strengths No and NX, respectively. The canonical way to represent stabilizing 
selection (Lynch and Lande 1993) will assume that No = NX = s/2. We relax this 
constraint, and allow selection to act independently over v and over the deviation 
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Figure 4. Marginal distribution of frequencies of an allele. (A) Changing the intensity of 
selection over the trait modulates the height of the adaptive peaks (other things being 
equal, Ner=-3.0) resulting in pronounced changes in the expectation of the trait mean but 
with weak changes (even a relatively constant value) of the expectation of the genetic vari- 
ance. Solid line: N/3=2.5, (z) = 1.21, (u) = 1.12. Dashed line: N/3=0, (z) = 0, (v) = 1.13. 
Dotted line: N/3=-2.5, (z) = -1.21, (v) = 1.12. (B) Changing the intensity of selection 
over the genetic variance modulates the position of the adaptive peaks (other things being 
equal, N/3— 2.5) resulting in pronounced changes in the expectation of the genetic vari- 
ance, but with weak changes on the expectation of the trait mean. Dotted line: N<r=0, 
(z) = 1.18, (v) = 8.04. Short-dashed line : Ncr=-1.0, (z) = 1.19, (v) = 7.52. Large-dashed 
line: Ncr=-4.0, (z) = 1.22, (u) = 5.32. Solid line: : N<r= -10.0, (z) = 1.36, (u) = 2.20. In all 
cases N/x=0.5, NA=-1.0, and the trait is composed of 20 loci of equal effects. 



from the optimum. At the moment, we focus on the first alternative. 
The scenario that we study, is that when a population is initially under selection- 
mutation-drift equilibrium where selection acts over a quantitative trait, but with 
little strength on the variance. Suddenly, selection against the variance increases, 
and the optimum changes sign. This implies a radical reconfiguration of the adaptive 
landscape. In this case, will the SM approximation be accurate? An example of this 
scenario is demonstrated in fig. [7] . We find that there is a good agreement with the 
simulations. 

As explained above, selection over the trait mean and over the genetic variance are 
practically uncoupled. Thus if the optimum moves without affecting the strength of 
selection, the trait mean can freely evolve without significant changes on the genetic 
variance (figs.[2j[3j[5|. However, we have now studied a more general situation when 
both forces, the optimum and the selection strength, change. Figure [7] shows that 
the pattern of change of the force Nj3* is similar to the one in fig. [3] where a 
similar situation was studied, except that the genetic variance did not change. This 
suggest that even in this case (fig. [7]) where both selection and the optimum shift, 
the evolution of the genetic variance and of the trait mean are uncoupled. This 
is unexpected because, unlike the previous example, the adaptive peaks are being 
relocated. This rearrangement of the microscopic space, which originally was nearly 
peaked close to 1/2 but which suddenly became bi- modal, partitioned the occurring 
alleles closer to the borders of fixation -these configurations that diminish the 
genetic variance. Simultaneously, a high frequency of ' + ' alleles was produced, and 
at another locus a high frequency of '-' alleles, in such proportions that maintain 
low genetic variance, while allowing the trait mean evolve to increase to its new 
optimum value. 

Under the quasi-equilibrium approximation, the SHoC formula (eq. 4.1) would hold 
for any time if we would substitute Njj, and Na for the effective forces Nfi* and Na* . 
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Figure 5. Evolutionary dynamics when the optimum moves gradually from -5 to 5 (which 
corresponds to N/3 =±10, see panel C, dotted line) at a rate of 510~ 3 units per generation. 
The trait consists of n=100 loci of equal effect. Expectations (black dashed lines) of (A) a 
polygenic trait and (B) its genetic variance. The gray regions cover ± the standard error 
(root mean squared deviation of the variance of the macroscopics) . The gray solid lines 
are averages of the numerical realizations (withiV = 100; 500 replicas were employed). 
The goodness of fit (151 degrees of freedom) rejects the null hypothesis for the trait mean, 
and accepts it for the genetic variance: (A) z = — 0.21,p = 0.009; however, the maximum 
deviation is 37.31% of the standard error (0.50); (B) z — 0.067, p = 0.41; the maximum 
deviation isl5.94% of the standard error (1.41). (C-F) Evolution of the local forces. Notice 
the short range of change on (D-F), these forces remain practically constant. NX = -1, Na 
= -4, Nfx = 0.5. Otherwise as in fig. [2] 



Thus the expectation of the genetic variance would only change if any of these 
change. If the optimum value is displaced without significantly changing Nfi* or 
Na* , there would be no reason to expect a change in (^)shoC- A marginal issue here, 



is whether the predictions from cq. (4.1 ) are in agreement with the SM expectations. 
For example, for the initial state depicted in fig. [7] , the numerical averages (from 
the Wright-Fisher simulations) for the genetic variance give £\yF =31.13 ± 1.65, 
the SHoC gives (<v) SH oC = 30.00, whilst the SM gives (z/) SM = 31.08 ±1.5. For the 
end equilibrium, we have i>wf =14.08±1.36 , and the models give (iv)shoC = 10.00, 
whilst (^)sm = 14.12±1.21. There seems to be a decent agreement between both 
predictions. However, the agreement between the SHoC and the SM expectations 
does diverge as Na — >0 (data not shown). 
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Figure 6. Evolutionary dynamics when the optimum moves gradually from -12 to 12 (which 
corresponds to N/3 =±48, 60% of the total range; see panel C, dotted line) at a rate of 510 -3 
units per generation. The trait consists of n=20 loci of equal effect. Expectations (black 
dashed lines) of (A) a polygenic trait and (B) its genetic variance. The gray regions cover 
± the standard error (root mean squared deviation of the variance of the macroscopics). 
The gray solid lines are averages of the numerical realizations (with N = 100; 500 replicas 
were employed). The goodness of fit (101 degrees of freedom) rejects the null hypothesis 
for the trait mean, and accepts it for the genetic variance: (A) z = — 0.64,p < 10~ 9 ; 
the maximum deviation is 171% of the standard error (0.36); (B) z = 0.06, p = 0.54; 
the maximum deviation is 41% of the standard error (0.69). (C-F) Evolution of the local 
forces. Notice the short range of change on (D-F), these forces remain practically constant. 
NX = -2, No = -2, Nfi = 0.5. Otherwise as in fig. [2] 



(/) Further dynamical modes 

Quantitative traits, and the genetic variance, under stabilizing selection can also 
evolve by other means than selecting over the variance, and changing the position 
of the optimum. The statistical mechanics allows z and v to vary by changes in NX. 
Recalling that this parameter penalizes the deviations of the trait mean from the 
optimum, varying it should not affect the value of z. In fact, if we keep the optimum 
at a fixed place, and say, we reduce NX, the (z) and (u) remain almost unchanged. 
Instead, there is an increase in the standard deviation of z. Thus this parameter 
NX determines how much z deviates from its the expected values. Surprisingly, the 
marginal distribution of allele frequencies remains practically unchanged. Despite 
that an adaptive peak is not the same as the marginal distribution of an allele, 
the mean and variance at each adaptive peak do not depend on NX (see appendix 
2), thus altering its value will not affect the positioning and relative heights of the 
adaptive peaks. However, NX would make the distribution more or less concentrated 
at such peaks. Thus although the expectations remain the same, populations with 

Article submitted to Royal Society 



20 



H.P. de Vladar and N.H. Barton 




Figure 7. Evolutionary dynamics when the optimum changes abruptly, shifted at time t=0 
from -5 to 5 (which corresponds to N/3 =±10, see panel C), and simultaneously, selection 
against the genetic variance is increased from Nu—-1 to Na—-5. The trait consists of 
n=100 loci of equal effect. Expectations (black dashed lines) of (A) a polygenic trait and 
(B) its genetic variance. The gray regions cover ± the standard error (root mean squared 
deviation of the variance of the macroscopics). The change in the genetic variance is at 
most of 0.4% of the initial value, while its standard error is 10.3%. The gray solid lines are 
averages of the numerical realizations (with N = 100; 500 replicas were employed). The 
goodness of fit (151 degrees of freedom) accepts the null hypothesis for the trait mean, 
but rejects it for the genetic variance: (A) z = — 0.011, p = 0.87; the maximum deviation 
is 17.19% of the standard error (0.50); (B) z = 0.28,;p < 10" 4 ; the maximum deviation is 
39.77% of the standard error (1.37). (C-F) Evolution of the local forces. Notice the short 
range of change on (D-F), these forces remain practically constant. NX =-1, Nu = 0.3. 
Otherwise as in fig. [2] 



larger NX would show population means and variances more scattered than other 
population (or states) where NX is smaller (for numerical examples, see figs. 4 and 
5 of the ESM 1). 

Another way in which populations can evolve, is by changing the mutation rate. 
An increase (decrease) in the number of mutants will inflate (deflate) de expected 
genetic variance, but will not affect the trait mean's expectation. However, if the 
mutation rate approaches the /i = 1/4/V®, the SM method is inaccurate (this will 
be discussed in more detail in §4.7) . 

At such point, the borders of the distribution (i.e. p=0,l) become absorbing, 
leading to a failure of the local equilibrium dynamics (figs. 6 and 7 of the ESM 1). 
Close to the boundaries, two effects are present: (1) drift is much more powerful 
than selection in fixating alleles, and (2), mutation does not produce enough alleles 
to keep the distribution away from fixation. Towards the center of the distribution, 
selection is more effective than drift in maintaining polymorphisms. The rate at 
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which each process occurs have different time scales: fixation towards at the borders 
happen very fast, whereas diffusion away from the borders is much slower. The local 
equilibrium dynamics cannot cope with both time scales at the same time, and the 
SM approximation fails (Barton and de Vladar 2009). 

(<?) Some negative results: failure of the SM approximation. 

Failure at low mutation rates. In our previous paper, where we dealt with direc- 
tional selection (Barton and de Vladar 2009), the SM framework had to be modified 
for low mutation rates (Nfi <l/4). At these low mutation rates, there is a major 
qualitative change in the distribution of allele frequencies, since the fixation states 
suddenly become absorbing. This is also true in the present case of stabilizing se- 
lection. Thus, when mutation rates are close to this critical pointiV/z=l/4, the local 
equilibrium underestimates the rates of change (e.g. the genetic variance), and the 
accuracy of our predictions is lost. In the ESM 1 (Sect. 5), we show two examples, 
with the optimum abruptly increased and for a moving optimum, where the pre- 
dictions deviate significantly. 

When N/i <l/4 the macroscopics that characterize the system change, in what 
is called a phase transition. In this case, the genetic heterozygosity [/, has to be 
dropped, and the dynamics of the other macroscopics must be computed from the 
jump processes (using a master equation, instead of diffusion; Barton and de Vladar 
2009). This works well for directional selection, but would lead to a different set 
of equations for the local variables and for the macroscopics compared with the 
high mutational input case. Nevertheless, the philosophy is the same: to maximize 
entropy under the constraint of the appropriate macroscopics, and approximate the 
dynamics using the local equilibrium Ansatz. 

Failure when selection on the genetic variance is strong. We also found that when 
selection on the genetic variance is very strong, 4| Na\ »1, we also lose accuracy. In 
the ESM 1, we illustrate this situation for abrupt changes and gradual moves in the 
optimum trait value. The reason for the failure seems to be that when the valleys 
between the adaptive peaks are deep, the microscopic distribution approaches a 
stationary distribution very slowly, so that our central assumption that tp is nearly 
stationary, fails. Again, a jump process seems to be a more accurate model than 
diffusion, which we are eager to apply for stabilizing selection. We will come back 
to these explanations later. 

(h) Hysteresis 

The quasi-equilibrium assumption will hold when the changes in the micro- 
scopic states are faster than the macroscopic changes. Because of the complexity 
of the adaptive landscape, a given macroscopic state can stand at different adap- 
tive peaks. Although these peaks may be indistinguishable from the macroscopic 
point of view, they will have an effect on the rates of change of the latter. Because 
the transitions among different adaptive peaks may in general happen at different 
rates, then the final macroscopic state - after a peak shift - will depend on the 
particular microscopic configuration at the time of the jump. This is remarkable, 
because the stationary distribution is unique, and is determined solely by (A). An 
immediate implication is that the trajectory that the macroscopics take during evo- 
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lution, depends on their previous history. That is, the dynamics show hysteresis. 
Thus, even when the microscopic variables are averaged out, there can be memory 
in the macroscopic dynamics. 

In fact, we find that, out of equilibrium, a particular value of (z) does not always 
correspond to a unique value of (u) , which is the signature of hysteresis. Figure 
[8]A- shows an example where a trait evolves starting at the point a, following an 
optimum that increases towards the point b. After equilibrium has been reached, 
the optimum then moves back to the original value a; the other parameters are left 
unchanged. The trajectories in the space (z) , (y) are not the same for the forward 
and backwards processes. Although the predictions and the simulations do not agree 
(clearly, we are in the range where the adaptive peaks are well separated) , both, the 
actual model and the SM approximation show an equivalent qualitative response. 
In fig. [8j3 the optimum moves slowly (as in fig. [5] ) and afterwards, moves back to 
its original point. Because the optimum follows the same trajectory forward and 
backwards, the amount of hysteresis lower. Finally, in fig. |8p both the optimum 
and selection against the genetic variance change abruptly (as in fig. [7]), and af- 
ter equilibrium is attained, they abruptly change back to their original values. The 
rates of change of (z) and (v) have different time scales, and hysteresis is enormous. 
The puzzling issue here is, why after averaging over the microstates, do the trajecto- 
ries of the expectations show hysteresis? The question seems be difficult to answer 
in detailed, mechanistic terms. But, we can understand it from the macroscopic 
phenomenology. As we showed and discussed above, the trait mean and the genetic 
variance can evolve more or less independently. Depending on the strength of the 
selection coefficients, which are the directional component towards the optimum - 
Nj3- and the strength of selection again the genetic variance -Ncr-, the response will 
be quicker for one component or for the other (depending which one is effectively 
bigger). Thus the macroscopic that changes faster in one direction, will be also the 
one that changes faster in the other direction. For example, in Fig (9A) the opti- 
mum shifts. The standing genetic variance initially increases as the rare mutants 
become frequent and the trait mean increases. Eventually, this variance is rapidly 
consumed as the trait mean slows down relaxing to equilibrium. If we now consider 
the reverse process, shifting the optimum back to its original position, the story for 
the variance will be exactly the same: initial mutants will increase their representa- 
tion, and the genetic variance increases. But although the genetic variance is taking 
the same path, the trait mean takes a mirror path: before it was characterized by a 
period of quick increase, followed by a slow increase. Now it takes a quick decrease 
followed by a slow decrease. 
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Figure 8. Hysteresis on the evolutionary dynamics of a trait that consists of n=100 loci 
of equal effect. (A) Abrupt change in the optimum (iV/3=±12, Na=-15, NX=-1, N/j,=0.7). 
(B) Moving optimum {N/3 =±10, NX = -f, Na = -4, Nfi = 0.5; as in fig.^. (C) Abrupt 
change in the optimum and selection against the genetic variance (A r /3=±12, Ncr=-1 — >• 
-5, NX =-1, Nfj, = 0.3 ; as in 

fig.0- The arrows indicate the direction of evolution, which 
start at the points s, ending at the points e, and then backwards, that is with the forces 
that changed are switched back. The black lines, are the SM predictions, and the gray 
lines, averages from the simulations (population size iV = 100; averages over 500 replicas). 
In all cases, the paths on both directions do not overlap, showing that the macroscopic 
states depend on the path. 
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5. DISCUSSION 

The main question that we have addressed in this work is whether the SM ap- 
proximation is valid in the challenging case of stabilizing selection. In general, the 
maximum entropy distribution has been shown to match that of the exact model 
(the diffusion equation) at equilibrium, and it is straightforward to determine which 
macroscopics are -in principle- relevant for evolutionary change (Appendix 2 in 
Barton and de Vladar 2009). This extension of our previous methodology has been 
applied to an important and classical theme in population and quantitative genetics: 
understanding the factors that generate quantitative variation and its evolution. In 
the introduction, we posed the problem which refers to the difficulties that appear 
in estimating the change of a complex trait. We also proposed three ways around 
to these complications, namely unequal allelic effects, averaging over the different 
deterministic paths, and genetic drift. We have taken only the last two, for clarity, 
and assume equal effects to give a worst case scenario where microscopic irregu- 
larities are enhanced. In fact, an extension of the SM method to unequal effects 
involves a slightly different version of the generating function and per-locus equa- 
tions (see Appendixes 1-2). The approximations follow from the application of the 
central limit theorem, which docs not require the same distribution at all loci, only 
independence between these loci (see Appendix 1). 

We have been able to predict long term changes of the expectations of the mean and 
variance of a trait, in the extreme situations when selection either changes abruptly, 
or gradually due to a moving optimum. In each case the expected trait mean follows 
radically different paths, yet the expected genetic variance stays roughly constant. 
Unless there are changes in the selection against the genetic variance, its expec- 
tation seldom changes appreciably. Therefore, classical approaches of quantitative 
genetics, like the breeder's equation (Landc 1979; Falconer 1981, ch. 20), where the 
constancy of the additive genetic variance is assumed, seem to describe fairly well 
the evolution of the trait mean, since the fluctuations due to drift do not seem to 
affect critically this constancy of the expected genetic variance. 
In our studies of the dynamics, the effective forces other than selection towards 
the optimum (N/3) are weak. This reflects the general principle that selection on a 
polygenic trait is much more effective on the mean trait than on the variance, by 
a factor equal to the number of loci. In turn, this is exposed by the local variables, 
which remain very similar to the actual forces, i.e. Na* <~ Na, and A/ju* ~ Nfi, 
and may be related to several factors. First, if the range in which the optimum is 
changing is narrow (compared with the whole range of response of the trait, ±n 
, as most traits seem to be), the alleles are unlikely to fix. Even if they have fre- 
quencies close to the borders (e.g. if selection against the variance is strong), the 
variability that is generated by mutation is not reduced through fixation (because 
we assume Nfi >l/4). Second, because mutations are frequent, the jumps between 
peaks provide a mechanism for the change of allele frequencies at an appreciable 
rate. This is a mechanism which essentially, maintains the genetic variance. Thus, 
selection on the trait mean can act without affecting the mechanisms that maintain 
variance. Selection can bias the rates between the peaks in one or the other direc- 
tion, affecting the trait mean. But again, this does not affect the heterozygosity 
of the population. This allows the character to be displaced without changing the 
genetic variance. 
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However, if we select the trait over a wide range, say starting and ending at extreme 
trait values, we would force fixation of some of the alleles and observe a reduction in 
the genetic variance toward the extremes, but with a transient inflation in-between. 
In this case, most mutants appearing in the initial environment are rare, and there- 
fore the variance is small. When selection changes, these mutants become beneficial, 
and therefore their frequency increases (as has been experimentally documented, 
see Hoffmann and Merila 1999). Eventually, when we approach the other extreme, 
the ' good ' alleles fix, reducing the variance, as shown in fig. [6] . 
Although we have shown that the SM approximation works well in a range of situa- 
tions, it fails is when the adaptive peaks are well separated (that is when Nfi <I/4 
and/or selection against the genetic variance is strong, 4|A r cr|'7 2 »l; see fig. |4]B). 
When the distribution becomes peaked close to or at the boundaries, the alleles 
quickly approach fixation. In this case some of the mutants close to an adaptive 
peak are forced to jump to a beneficial peak. This process seems to be much quicker 
than the diffusion times, and therefore the SM predicts a slower response of the ge- 
netic variance. The predictions for the trait mean, however, remain accurate. But if 
the population remains in a regime where there is significant intermediate frequency, 
the SM method works well. 

To summarize, in the light of our theory, we attribute the constancy of the ge- 
netic variance to three factors. First, weak selection on the genetic variance. The 
effective forces on the genetic variance show weak changes, because although the 
new beneficial phenotypes require a reconfiguration of the genetic states, these can 
be attained by changing only the proportion of alleles at different loci which are 
at the different adaptive peaks, rather than a shift on the position of these peaks. 
Second, the presence of genetic drift. Here, we should also note that although the 
expectations smoothly change, an individual trajectory will nevertheless fluctuate 
abruptly. The bounds presented in the above figures indicate the range of these 
fluctuations. In some occasions, even when the changes in the trait mean or genetic 
variance are large, these bounds tend to be quite narrow. Therefore even in indi- 
vidual experiments we would expect a regular (i.e. not erratic) pattern of change 
in the trait mean and genetic variance (seen in reality, e.g. Barton and Keightley 
2002, fig. 2). Third, the number of loci composing a trait increases the genetic load, 
and thus diminishes the rate of change of the expectations of the genetic variance 
(assuming that selection is on the trait mean). Although this effect is notable in 
the expectations (figs. [2] and [3] ), it seems to be less critical than the above two, 
as indicated in our theoretical experiments where selection on the genetic variance 
induces notable changes (fig. [7]). 

The above results are in sharp contrast to a ' general ' situation where changes 
in the genetic variance are not smooth. The implication is that drift is a mecha- 
nism that regularizes the course taken by an ensemble of populations through the 
rugged fitness landscape. Thus, averaging over these drift effects helps us to under- 
stand the evolution of quantitative traits. Yet, there is no microscopic hysteresis 
in our results. We found, curiously, that hysteresis is present in the macroscopic 
trajectories. This phenomenon results from the combination of two factors. The 
first is that the genetic variance and the trait mean can evolve independently, each 
with their own characteristic time scale. The second is that the trajectory of the 
expectation of the genetic variance is symmetric in time, while that of the trait 
mean is antisymmetric, meaning that in the backwards dynamics, the trajectory is 
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a mirror image of the forward dynamics. Together, these two factors show not only 
that genetic systems have memory, but that to 'degrade' such memory, selection 
in the contrary direction might be required. Otherwise, the system may get stuck 
into local optima, especially for low mutation rates, where hysteresis is expected to 
be enhanced. This last statement is so far hypothesis, which needs to be verified 
with further analyses. 



6. CONCLUSIONS AND PERSPECTIVES FOR FUTURE 

RESEARCH 

Our approximation is a step forward in the understanding of the evolution of poly- 
genic traits: it allows us to unravel features using only a few macroscopic measures. 
Still, traits rarely evolve independently from others. In this sense, there are two 
plausible extensions to our present work. The first is the explicit coevolution of two 
or more traits. As in the univariate case, the rate of change of a trait is proportional 
to the genetic covariances, which are subsumed in the C?-matrix (Lande 1979). The 
natural question is then how quickly can this matrix respond to selection, mutation, 
and drift (Steppan et al 2002; Blows 2007). Our method can address this question; 
it is straightforward to formulate the problem by defining a vector of traits ( z). 
This would then define a multivariate trait version of the matrices B and C, from 
which we can in principle forecast the evolutionary dynamics responding to simul- 
taneous selection over the different traits. A related problem that we envision is 
that of pleiotropic effects. Changes in the value of a trait are often accompanied by 
a change in the fitness of the individual, not only because of its direct effects, but 
also because many other traits are indirectly being affected (Turelli 1985; Barton 
1990; Wagner et al 2008). In this situation, the net effect of the mutations tends 
to be deleterious. This phenomenon can be quantified by a variable that accounts 
for all these pleiotropic factors through their net effects on fitness (Turelli 1985; 
Barton 1990; Keightley and Hill 1990; Kondrashov and Turelli 1992; Gavrilets and 
Dc Jong 1993; Orr 2000). 

Population genetics still faces the challenge of explaining interactions among dif- 
ferent factors that have uncertain consequences for the evolutionary process. The 
SM method, although relying on a moderately complicated mathematical machin- 
ery, allows us to systematize several of these factors and to study their effects in 
a relatively easy way. It is an ideal companion to diffusion methods, and although 
developed in part as an analogy with SM in physics (Barton and de Vladar 2009; 
Barton and Coe 2009), it has the same ultimate goals as population and quanti- 
tative genetics: to inquire on the factors that produce and maintain variability in 
genetically complex traits. 

The authors would like to thank J. Polechova and F. Palero for comments and discussions. 
This project was supported by the ERC-2009-AdG Grant for project 250152 SELECTION- 
INFORMATION. 
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Appendix A. Gaussian approximation for the generating 

function 

Following Wright (1937), wc can partition the distribution of allele frequencies, and 
thus the integrand of Z, into the elements of W 2N that are associated with the trait 
mean, and associate those elements which are not, namely: 

Z = J exp [2Nf3z - 2N\z 2 ] exp[2Nau + 2N^U]<j)d n p . (A 1) 

Now, notice that the second exponential term is separable as a product over 
loci: Exp[2 Nov + 2 Nfi U] <j> =U7=i e 4 ^""' {PiQif^' 1 • If the allele frequencies 
are not too close to fixation, then the last product (properly normalized) can be 
approximated by a multivariate Gaussian distribution, centered on the expectation 
of a given set of macroscopics. We choose these to be: z, v, m 3 , to 4 and H which 
are defined by 



(m 3 ) = 2 if (1 _ ^Pi)Piqij (third moment of the trait XA 2) 
(mi) — 2 (^^Ji (1 — 2pi) 2 PiQi^j (fourth moment of the trait)(A3) 

(H) = 2 (P**)" 1 ^ — (genetic variance of the log - heterozygosity )(A 4) 

These are required to compute the matrices B and C. The variances and crossed 
moments follow directly. 

Before applying the Gaussian approximation, it is convenient to partition the dis- 
tribution of each locus around each adaptive peak. For instance, calling = 
e 4Ar<T7 pq^yNfL-i density around the peak ' + ' peak (close to p=l), and 
similarly k_ for the '-' peak (close to p—0), where Z D is their normalization con- 
stant (see below), then the product over loci can be expressed as a binomial sum 
over the pair of peaks of each locus 

n n I \ 

He iN ^^(p iqi ) 4N ^- i =z i[o=z j2\ n • (A5) 

i=l i=l keK \iek,jek c J 

The index k runs in the all the combinations of loci at each peak. For example 
all loci at the '-' peak wold give k={l,2,. . .,n }, and its complement would be 
k c = {}, an empty set. If loci 1,2,4,5 are in the - peak and the rest in the + peak, 
then k={l,2,4,5} and k c = {3, 6, 7, 8, . . . , n}. Thus K contains all these possible 
permutations. Now we apply the Gaussian approximation (central limit theorem) 
to the product riiekjek c K i~ K j+i which leads to 



i=l [Z7T > keK 



1 -11 
--(a- (a) k ) ■ c k -(a-(a)k) 

(A 6) 
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and here, the vector a= {z, v, 1TI3, 1114, H}, a T is its column transpose, and c k is its 
covariance matrix (its determinant denoted by det (cfe)). The subscript 'k' indicates 
that the expectations and covariances are with respect to the distribution at the 
adaptive peak of the particular configuration k, and which is free of selection over 
the trait. The normalization constant is: 



n pl/2 

%o = T[ e 



iN(T^j i pq 



(pq) 



4JV/1-1 



dp = f[Z. 



(i) 



(A 7) 



We employed the additive property of the functions z, v and U, to separate the 
integrals, so that the 7L s are characteristic functions of per-locus systems subject 
to mutation, drift and selection acting over the genetic variance, and therefore 
depend on the variables Na and Nfj, only: 



Z* = V^4- 4Ar ' J r(4iV M ) lJ F 1 



(A 8) 



(r is the gamma function, and \F\ is the regularized confidential hypergeometric 
function; Abramowitz & Stegun 1972, ch. 13). The other statistics for the adaptive 
peaks are analyzed in Appendix 2. Now, introducing eq. (A6) into eq. (All: 



Z = 



(2tt) 5 /2 



det (c k ) 1/2 I exp [2N/3z - 2NXz 2 ] xexp 

keK J 



1 



(a - (a) k ) • c k 1 • (a - (a) k ) 
(A 9) 



In this integral we should consider a finite range of integration running from the 
two extremes of each macroscopic, a m - m to a max . But for many loci, the density is 
concentrated near the expectations, thus performing the integral in the whole range 
is a good approximation. This of course is true as long as the expectations are far 
enough from the borders (see ESM 1). Here all the variables except z integrate to 
1, and we get 



da . 



exp 

keK 



Qk 



(N/3 2 Vt + N0Mk ~ A^AM k 2 ) 



(A 10) 



where 



Q k = 1 - 



The expectation of the trait mean at each peak M k and its variance V^are given 
in Appendix 2. 



The expression (A 10) is not easy to deal with because the sum is performed 



over a vast number of terms (all the possible combinations, which amount to 2"). 
One way to make progress is assuming equal effects. In this case, the sum becomes a 



binomial (see for example in eq. A 5 that the sum will directly be a binomial sum). 



Many of the combinations are equivalent, because all that matters is the number 
of loci in one peak, or in the other. Each combination is weighted by the binomial 
n 



term, denoted by 



A number m of loci standing in a peak 



1 m!(n— m!) ' 

contribute to the expected mean by mM Q , where M a is the expected trait mean 
of a single locus. The other peak will house n-m loci. However, the mean M Q at 
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the two peaks have equal value but opposite signs (by convention we take the value 
at the ' + ' peak) , thus its net contribution to the expectation of the trait mean is 
(to — n)M . Adding the contribution of both adaptive peak with a configuration of 
( n-m,m), the expectation of the mean is thus (2m — n)M . The expected variance, 
however, is invariant with respect to the ' + '/'-' peaks. It only depends on the 
deviation with respect to the mean value at a peak, which for both peaks is the 
same, to which each locus contributes by V . The ' + ' peak contributes by m and 
the '-' peak by n-m. Overall, the total variance is nV . Thus eq. (A 10 1 simplifies, 
for equal effects, to: 



VQ 



m— 



n 
m 



xexp 



Q 



(Np 2 nV D - N(3{n - 2m)M - NX(n - 2m) 2 M Q 2 ) 



< A11 > 

Although it is feasible to work with this expression it is convenient to approx- 
imate the binomial term as a Gaussian. Since the exponential factor of the sum is 
also of quadratic order on m, we can obtain a closed expression for the whole sum. 

Thus approximating ^ n ^ ~ 2 n (7m) -1 / 2 exp [—2 (to — n/2) 2 / n\ , we find that 



eq. (All) becomes 



Z 



2Z, 



\f~ivnL 



^ exp [—2 (to — n/2) 2 / n\ xexp 



m— 



- (N(3 2 nV - NP(n - 2m)M - NX(n - 2m) 2 M 2 ) 



'A 12) 

Now, we replace the sum by an integral over m in the whole real range (-00,00), 
and solve it to get 



27L 



R 



■exp [2Np 2 n (V Q + M 2 ) / R] 



(A 13) 



where 



R = 1 - ANXn (V + Ml) 



We would like to remind that NX is always negative, thus the factors Q and 
R remains positive for any choice of parameters. 

Although it is tempting to employ the last equation to calculate the macroscop- 
ics, this in most cases it will lead to substantial errors. The correct method is to 



compute the derivatives from eq. (A 12) and then perform the Gaussian approxi- 
mation over the sums of such result. 



Appendix B. Per-locus statistical mechanics 

The expectations for the macroscopic variables above, are expressed in terms of the 
contributions by the different permutations of the loci around each adaptive peak, 
which we expressed as a Gaussian function. These contributions are also described 
with a statistical mechanical approximation, but for a polygenic system where se- 
lection acts only over the genetic variance, and where mutation and drift affect the 
population's genetic structure. This system has its own generating function, given 
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by eqns. (A7| and (A8), from which the expectations and covariances are calcu- 
lated. The first two expectations that we require for the full system of stabilizing 
selection are those of the trait mean and its variance at the adaptive peaks, that is 
Mic and Vjj, which are given by: 

M-l-A Vfai ^ 1^(4^4^ + 1;^) mi , 

M * = {Z) k = 2^ l 21 k(m) - l ) g r a at.. .ai^t.. i 1 /o. A7— ,2 ^ ^ ^ 



ro=0 



L F 1 (4A^;4A^+l/2;iVa7 r 2 J 



where lnj(m) is an indicator function, which gives 1 if the locus m belongs to k and 
otherwise. The variance is T4 = textVar^z = (z 2 \ 



(z)l 



Ag 2 v = y 7^ i^i (4^i 4iV/x + 3/2; iVfrygJ 
k i 2 1 F,(%% + l/2;* 7 i) 



Similarly, other quantities can be computed from the eqns. (A 7) and (A8 by 
direct integration or by taking derivatives w.r.t. N/sigma, to obtain the expectation 
of the genetic variance for a given configuration across the peaks, or w.r.t. Nfi 
to obtain the expectation of the genetic variability, U. And similarly, the second 
derivatives to compute the variances and covariances. With this method we can 
obtain the expressions for all the macroscopic variables needed to compute the 
dynamics of the fully coupled polygenic system. Under the assumption of equal 
effects, all the adaptive peaks are equivalent. Thus the product of the per-locus 



generating functions (eq A 7 ) reduces to 

7Lo=K (B3) 
and consequently all the macroscopic variables simplify. Under equal effects, eqns. 



(Bll and (B2) become 



M = jjjn - 2m) iA AN/i + 1; Naj 2 ) 

'° ° \AF 1 F 1 (4Ar/i;4A M + l/2;A f j 7 2 ) 

= ni 2 iA (4A^; 4A^ + 3/2; Mx 7 2 ) 
^ 2 1j F 1 (47V j u;4A^ + 1/2; Ncrj 2 ) 

Using the generating function, the genetic variance follows directly: 

<91og(Z ) 2 
Wo = 2gNa = 2nA^7 (B6) 

Similarly, the expectation of the genetic variability results in 

{u) ° = d} mr = 2n (* ( °)[ 4 ^] - iog(4)) + 

^(0,1.0) (^ NjJL . mfx + y 2 . + ^(1.0.0) ^ + ^ 

" 1 F 1 {AN f i;AN^ + 1/2; Najl) (B ?) 

The function 'J'(fc) is the polygamma function (k'th derivative of the log-gamma 
function), and this as well as the derivatives of the hypergeometric functions lack 
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a closed form, and are special functions themselves. Fortunately, in most cases 
these are readily implemented in numerical packages. See however Abramowitz and 
Stegun (1972, Chs. 6 and 13) for their definitions in integral or series forms, and 
their properties. Other terms of higher order like the variances and covariances 
follow similarly. However, they involve complicated and long polynomial forms on 
the hypcrgcometrics, and writing them down is not particularly useful. However, 
all the calculations performed in this research are implemented in a Mathcmatica 
(v7.0) package supplied in the ESM 2, which includes a brief tutorial (ESM 3). 
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